Deepfake Detection Challenge: A bronze solution (so far)
A bronze medal solution for Kaggle Deepfake Detection Challenge
Just about 4 months ago, Kaggle started hosting a very interesting competition with $1,000,000 in prize money: the Deepfake Detection Challenge. Although it is very tempting to chase that kind of prize, for me Kaggle competitions are always about learning. Unfortunately, I joined the competition pretty late, when only about a month was left, but I still gave it 100% to see how much I could achieve and learn. The competition has now ended, but the final results on the Private Leaderboard will be revealed once the participants' models are evaluated on a hold-out set by Facebook.
AWS, Facebook, Microsoft, the Partnership on AI’s Media Integrity Steering Committee, and academics came together to build this challenge by providing a dataset of ~100K (~500 GB) real and fake videos. First, I would like to thank all of these organizations and individuals for creating this challenge, and Kaggle for hosting it, letting talented people in the field work on such an important problem for our society.
Without a doubt, deepfakes and similar adversarial content generation and manipulation methods are a serious threat to everyone. They can have significant implications for the quality of public discourse and the safeguarding of human rights. Misinformation can lead to dangerous and even fatal outcomes. These kinds of threats appear not only in computer vision but also in NLP. For example, OpenAI's gigantic GPT-2 model stirred similar controversy about adversarial risks, and for that reason the full model trained by the team was initially withheld for some time.
> "These samples have substantial policy implications: large language models are becoming increasingly easy to steer towards scalable, customized, coherent text generation, which in turn could be used in a number of beneficial as well as malicious ways."
>
> We're releasing the 1.5 billion parameter GPT-2 model as part of our staged release publication strategy.
>
> - GPT-2 output detection model: https://t.co/PX3tbOOOTy
> - Research from partners on potential malicious uses: https://t.co/om28yMULL5
> - More details: https://t.co/d2JzaENiks pic.twitter.com/O3k28rrE5l
>
> — OpenAI (@OpenAI) November 5, 2019
As AI techniques evolve, people will not stop using them for harmful purposes. That should not put barriers on technological development, though; instead, it should encourage everyone in the community to contribute to the fight against misuse through transparency.
Those who are further interested in, and concerned about, the implications of AI for ethics and society can check out Fast.ai's ai-in-society blog series.
The goal of the competition was to detect real and fake videos, which can be framed simply as a video classification task. The provided training dataset contained only a binary fakeness label for each video and no other information. It was also stated that the fake videos could contain visual manipulations, audio manipulations, or both. The technical details of the manipulation methods were not publicly disclosed, so as not to defeat the purpose of building a robust and general deepfake detection model.
Log loss was selected as the evaluation metric. It can be considered a better choice than accuracy, as it also captures the confidence of the predictions.
$$\textrm{LogLoss} = - \frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right]$$
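To make the metric concrete, here is a minimal NumPy implementation of the formula above. Note the clipping step: since $\log(0)$ is undefined, a single fully confident wrong prediction would make the score infinite, so predictions are conventionally clipped away from 0 and 1 before scoring (the `eps` value here is an illustrative choice, not something specified by the competition).

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary log loss, with predictions clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

This is why over-confident submissions were risky: predicting 0.99 for a video that turns out to be real costs far more than a cautious 0.7 would.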
The dataset was created with the help of volunteer actors and their self-recorded videos. Each of these videos was then manipulated with different deepfake methods, so every original video had multiple fakes, each corresponding to a particular method.
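The training set shipped with a `metadata.json` per chunk, mapping each video file name to its label and, for fakes, the original video it was derived from. A small sketch of grouping fakes by their source video (the file names in the test data below are illustrative, not real dataset entries):

```python
from collections import defaultdict

def group_fakes(metadata):
    """Map each original video to the list of fakes derived from it.

    `metadata` is a dict like the one loaded from metadata.json:
    {"xyz.mp4": {"label": "FAKE", "original": "abc.mp4"}, ...}
    """
    fakes_by_original = defaultdict(list)
    for name, info in metadata.items():
        if info.get("label") == "FAKE":
            fakes_by_original[info["original"]].append(name)
    return dict(fakes_by_original)
```

Grouping this way is also handy for building train/validation splits that keep an original and all of its fakes on the same side, avoiding leakage of the same face across splits.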
Here, let's display an original video and its fake. In some cases the differences may be very subtle to the human eye.
Tip: look closely at the left eye in both videos. You may also notice minor generation artifacts in the video on the right.
After exploring the dataset and reading discussions on the Kaggle forum, it became clearer that the facial manipulation techniques employed varied in both quality and localization. It should also be noted that these deepfake methods are far from perfect, which introduces additional noise into the training set. More detailed information on a preview version of the dataset is available in this paper.
Below is an example batch of face manipulations. In each pair of face crops, the left corresponds to the original and the right to the fake video frame.
